Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Authors

Joel Z. Leibo

Vinícius Flores Zambaldi

Marc Lanctot

Janusz Marecki

Thore Graepel

Abstract

Matrix games like Prisoner’s Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
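The "mixed incentive structure" of a matrix game social dilemma can be made concrete with a small sketch. The code below uses a commonly used formalization with four payoffs (R for mutual cooperation, P for mutual defection, S for cooperating against a defector, T for defecting against a cooperator) and checks the usual inequalities. The names `Payoffs` and `is_social_dilemma` and the numeric payoff values are illustrative assumptions; they do not come from the abstract above.

```python
from typing import NamedTuple


class Payoffs(NamedTuple):
    """Row player's payoffs in a symmetric two-action matrix game (hypothetical helper)."""
    R: float  # reward: both players cooperate
    P: float  # punishment: both players defect
    S: float  # sucker: cooperate while the other defects
    T: float  # temptation: defect while the other cooperates


def is_social_dilemma(p: Payoffs) -> bool:
    """Check the inequalities commonly used to define a matrix game social dilemma."""
    mutual_coop_beats_mutual_defect = p.R > p.P
    mutual_coop_beats_being_exploited = p.R > p.S
    mutual_coop_beats_taking_turns = 2 * p.R > p.T + p.S
    greed = p.T > p.R  # incentive to exploit a cooperator
    fear = p.P > p.S   # incentive to avoid being exploited
    return (
        mutual_coop_beats_mutual_defect
        and mutual_coop_beats_being_exploited
        and mutual_coop_beats_taking_turns
        and (greed or fear)
    )


# Illustrative payoff values (assumed, not taken from the paper).
prisoners_dilemma = Payoffs(R=3, P=1, S=0, T=5)
no_dilemma = Payoffs(R=4, P=1, S=2, T=3)

print(is_social_dilemma(prisoners_dilemma))
print(is_social_dilemma(no_dilemma))
```

In this framing, cooperating and defecting are single atomic choices. The abstract's point is that in a sequential social dilemma the same incentive structure applies to whole policies rather than to individual actions.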


Related articles

Inequity aversion resolves intertemporal social dilemmas

Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. Howe...


Towards Cooperation in Sequential Prisoner's Dilemmas: a Deep Multiagent Reinforcement Learning Approach

The Iterated Prisoner’s Dilemma has guided research on social dilemmas for decades. However, it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner’s dilemmas, these choices are temporally extended and different strategies may correspond to sequences of actions, reflecting grades of cooperation. We introduce a Sequential Prisoner’s Dilemma (SPD) game to b...


Emotional Multiagent Reinforcement Learning in Social Dilemmas

Social dilemmas have attracted extensive interest in multiagent system research in order to study the emergence of cooperative behaviors among selfish agents. Without extra mechanisms or assumptions, directly applying multiagent reinforcement learning in social dilemmas will end up with convergence to the Nash equilibrium of mutual defection among the agents. This paper investigates the importa...


Strategic Foresighted Learning in Competitive Multi-Agent Games

We describe a generalized Q-learning type algorithm for reinforcement learning in competitive multi-agent games. We make the observation that in a competitive setting with adaptive agents an agent’s actions will (likely) result in changes in the opponents policies. In addition to accounting for the estimated policies of the opponents, our algorithm also adjusts these future opponent policies by...


Consequentialist Conditional Cooperation in Social Dilemmas with Imperfect Information

Social dilemmas, where mutual cooperation can lead to high payoffs but participants face incentives to cheat, are ubiquitous in multi-agent interaction. We wish to construct agents that cooperate with pure cooperators, avoid exploitation by pure defectors, and incentivize cooperation from the rest. However, often the actions taken by a partner are (partially) unobserved or the consequences of i...




Publication date: 2017
